Current approaches to datacenter power management (workload scheduling, CPU speed control, etc.) focus primarily on keeping the air temperature surrounding the servers within the manufacturer-specified constraint. This is problematic because several CPUs may still violate the on-chip thermal constraint, leading to reliability loss. The primary objective of this work is to develop a unified approach to datacenter power optimization (by controlling the CPU speeds) that accounts for the silicon-level temperature of VLSI components such as CPUs, the air temperature that directly impacts the reliability of other devices such as disks, and the performance delivered. Our algorithm follows a two-step approach: optimally solving a convex approximation that assigns continuous frequency values to all CPUs, followed by a discretization step that legalizes the assigned frequencies. The experimental results indicate that our method guarantees that both the on-chip CPU temperature and the off-chip air temperature stay within their constraints. In contrast, the traditional approach of constraining only the air temperature results in on-chip temperature violations on about 40% of the CPUs, or requires 42% more power to bring the CPU temperatures back within the constraint by increasing HVAC cooling.
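As an illustration of the second (legalization) step only, the following is a minimal sketch that assumes the simplest possible rule: each continuous frequency is rounded down to the nearest supported level, on the assumption that temperature and power do not increase when a CPU is slowed down. The abstract does not state the actual discretization rule, so the function name `legalize`, the rounding rule, and the frequency values are all hypothetical.

```python
from bisect import bisect_right


def legalize(continuous_freqs, legal_levels):
    """Round each continuous frequency down to the nearest legal level.

    Assumption (not from the paper): rounding down keeps a thermally
    feasible continuous assignment feasible, because lowering a CPU's
    frequency is assumed never to raise any temperature.
    """
    levels = sorted(legal_levels)
    legalized = []
    for f in continuous_freqs:
        idx = bisect_right(levels, f)  # number of legal levels <= f
        # If f is below every legal level, fall back to the lowest level.
        # This is the only case where the result exceeds f.
        legalized.append(levels[max(idx - 1, 0)])
    return legalized


# Hypothetical frequencies in GHz.
assigned = legalize([2.35, 2.4, 3.1], [1.2, 1.8, 2.4, 3.0])
```

Rounding down trades some performance for a conservative thermal margin; a rule that tries to recover that performance would need to re-check the temperature constraints after each adjustment.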